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ABSTRACT 


With the growing prevalence of skin diseases and the ever-increasing potential of computational 
diagnostics, this study delves into the exploration and comparison of three diverse models—Optimized 
Biomarker Feature Selection (OBFS), Convolutional Neural Networks (CNN), and PCA-based 
Classification—for skin disease classification. Utilizing the "DermNet Skin Disease Dataset" as our 
experimental ground, we evaluated the models on parameters like complexity, interpretability, 
computational efficiency, and adaptability to new data. The OBFS model, which uniquely combines feature 
extraction with information gain techniques, displayed a balanced performance, merging interpretability 
with decent computational demands. The results and insights gleaned from our investigation offer a 
foundational framework for researchers and practitioners in dermatology, emphasizing the potential and 
limitations of computational methods in skin disease classification. 


Keywords: Skin disease classification, Optimized Biomarker Feature Selection (OBFS), Convolutional 
Neural Networks (CNN), PCA-based Classification, Computational diagnostics. 


I. INTRODUCTION diseases. Multiple factors contribute to this 
increase. Urbanization, environmental pollution, 
changing lifestyles, increased exposure to 
harmful UV rays, and genetic predispositions are 
some of the key drivers behind the rising 


numbers. Additionally, with the global population 


Skin diseases, often referred to as 
dermatological disorders, represent a diverse 
group of conditions affecting the largest organ of 
the human body — the skin. These diseases can 


manifest in various forms, ranging from benign 
moles to severe conditions like melanoma, with a 
spectrum of other disorders like psoriasis, 
eczema, and acne in between. The skin not only 
serves aS a protective barrier against 
environmental hazards but also plays a critical 
role in our aesthetic and tactile experiences. Thus, 
any disturbance or ailment related to it can 
significantly impact an individual's overall 
quality of life. In recent years, there has been a 
notable rise in the global prevalence of skin 


aging, certain skin conditions that are more 
prevalent in older age groups are becoming more 
common. For instance, age spots, wrinkles, and 
certain types of skin cancers are more frequently 
observed in the elderly. 
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Fig-1: Types of Skin disease[18] 


The growing prevalence of skin diseases 
underscores the urgent need for efficient 
diagnostic and therapeutic strategies. Early 
diagnosis and treatment are often crucial in 
preventing complications, reducing morbidity, 
and ensuring better patient outcomes. While 
traditional diagnostic methods, such as visual 
examination and biopsy, remain invaluable, the 
incorporation of computational and digital tools 
offers a promising avenue for enhancing the 
accuracy and speed of skin disease diagnosis. In 
this research, we introduce the Optimized 
Biomarker Feature Selection (OBFS) model, a 
novel approach that combines feature extraction 
and information gain techniques. This model 
aims to offer a more precise and efficient method 
for skin disease classification, catering to the 
pressing need for advanced diagnostic tools in the 
realm of dermatology [1]. 


The Rising Prominence of Computational 
Methods in Diagnosis: 


In the digital age, healthcare has witnessed a 
paradigm shift in the methodologies adopted for 
diagnosis and treatment. Among the various 
medical fields, dermatology, in particular, stands 
to benefit immensely from computational 
innovations. This stems from the visual nature of 
many skin conditions, making them prime 
candidates for analysis through digital and 
computational techniques. 


Traditional diagnostic procedures in 
dermatology, while effective, often rely heavily 
on the subjective judgment of the practitioner. 
This is where computational methods have 
started to play a transformative role. Advanced 
algorithms, machine learning, and even deep 


learning frameworks are being researched and 
integrated into the diagnostic process. These 
computational methods enable the analysis of 
skin images with a level of detail and consistency 
often surpassing human capabilities. For instance, 
pattern recognition algorithms can sift through 
thousands of skin lesion images in moments, 
highlighting irregularities with pinpoint accuracy. 
Moreover, with the proliferation of smartphones 
equipped with high-resolution cameras, there's 
been a surge in teledermatology solutions. 
Patients can now capture images of their skin 
conditions and send them for analysis through 
specialized apps. These apps, powered by 
computational models, can offer preliminary 
diagnoses, guiding individuals to seek medical 
attention if necessary. Such advancements not 
only expedite diagnosis but also democratize 
access to healthcare, especially in remote regions 
where specialist dermatologists might be scarce 


[2][3]. 


The amalgamation of computational 
methods with traditional dermatological practices 
heralds a new era of precision medicine. Systems 
equipped with advanced algorithms can assist 
dermatologists by providing a second opinion, 
reducing diagnostic errors, and enabling early 
detection of severe conditions like melanoma. As 
computational power continues to grow, and as 
algorithms become more refined, the reliance on 
these digital tools in dermatology is poised to 
increase even further. 


1. Machine Learning (ML) Algorithms: 


° Support Vector Machines 
(SVM): Used for classification tasks, SVMs have 
been utilized to differentiate between different 
types of skin lesions. 

e Random Forests: An ensemble 
learning method that has shown success in 
classifying medical images, including 
dermatological ones. 

e K-Nearest Neighbors (KNN): 
A simple yet effective method for classifying 
diseases based on the similarity of features [4]. 

2. Deep Learning: 


° Convolutional Neural 
Networks (CNN): These are particularly suitable 
for image data. In dermatology, CNNs have been 
trained to identify and classify skin cancers with 
accuracy rates that rival experienced 
dermatologists. 
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° Recurrent Neural Networks 
(RNN): Often used for sequential data, they can 
be applied to medical time-series data, such as 
ECG traces. 

3. Image Processing Techniques: 


e Histogram of Oriented 
Gradients (HOG): Extracts features from images 
by focusing on the structure or the shape. 

° Wavelet Transforms: Useful 
for analyzing the texture in images, often applied 
in mammography for tumor detection. 

e Edge Detection: Techniques 
like the Sobel or Canny operators can identify 
boundaries within images, useful for lesion 
segmentation. 

4. Natural Language Processing (NLP): 


e Used primarily to extract 
meaningful data from electronic health records, 
clinical notes, and other text-heavy sources, 
aiding in the diagnosis by identifying patterns or 
trends in the textual data. 


5. Dimensionality Reduction: 


e Principal Component 
Analysis (PCA): Reduces the dimensionality of 
the data by preserving as much variance as 
possible. In medical imaging, this can help 
highlight the most important features in an image. 

° t-Distributed Stochastic 
Neighbor Embedding (t-SNE): A tool to 
visualize high-dimensional data. It's been used to 
visualize clusters of similar patients or diseases. 

6. Genomic Data Analysis Tools: 


e With the rise of personalized 
medicine, tools like GWAS (Genome-Wide 
Association Studies) help correlate genetic 
variations with specific diseases, enabling more 
precise diagnoses and treatments. 


7. Teledermatology Platforms: 


e As mentioned earlier, the 
integration of smartphone technology with 
dermatological applications allows for remote 
diagnosis. Advanced algorithms in the backend 
analyze patient-uploaded images to offer 
preliminary diagnoses or recommendations. 


Harnessing Feature Extraction and 
Information Gain Techniques for Enhanced 
Diagnosis: 


The diagnostic accuracy of computational 
models, especially in the realm of medical 


imaging, is heavily dependent on the quality of 
input data and the features extracted from them. 
In the context of skin disease diagnosis, the 
nuances of skin lesions, texture, color variations, 
and other attributes play a crucial role in 
determining the nature and severity of the 
condition. Therefore, the first step towards 
building an effective model is the extraction of 
these pertinent features from images, a process 
aptly termed as 'feature extraction [5][11]. 


Feature extraction techniques in image 
processing aim to distill the vast amount of data 
in digital images into a concise set of features that 
capture the essence of the visual content. For skin 
diseases, this could mean identifying specific 
shapes, textures, or color gradients that 
correspond to particular conditions. Techniques 
such as Histogram of Oriented Gradients (HOG), 
wavelet transforms, and others have been applied 
in various capacities to derive meaningful 
features from dermatological images. However, 
not all extracted features contribute equally to the 
diagnostic process. This is where ‘information 
gain’ techniques come into play [6]. Information 
gain is a concept from information theory that 
measures the effectiveness of a feature in 
classifying data. In simple terms, it evaluates how 
well a particular feature helps in distinguishing 
between different classes, in this case, various 
skin conditions. By leveraging information gain, 
one can prioritize and select the most informative 
features, eliminating redundancies and noise that 
might impede the classification process[12]. 


Combining feature extraction with 
information gain introduces a two-pronged 
approach: first, derive a comprehensive set of 
features from the image data, and then refine this 
set to only those features that hold maximum 
diagnostic value. This synergy ensures that 
computational models are not overwhelmed with 
excessive or irrelevant data, leading to improved 
accuracy, efficiency, and speed in skin disease 
classification [13][14]. 


2. LITERATURE SURVEY 
Feature Extraction in Medical Imaging: 


The significance of feature extraction in 
medical imaging cannot be overstated. By 
translating raw image data into a compact set of 
salient features, these methods enable more 
effective and streamlined classification and image 
analysis. 
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Histogram of Oriented Gradients (HOG): 
Originally introduced for object detection in 
broader computer vision tasks, the Histogram of 
Oriented Gradients (HOG) has seen growing 
adoption in medical imaging due to its adeptness 
at capturing gradient information, which often 
reveals edges and shapes in images. In essence, 
HOG divides an image into small interconnected 
regions termed as cells and calculates a histogram 
of gradient directions for pixels within each cell. 
These histograms are subsequently normalized, 
producing a descriptor for every cell that 
effectively captures the main edge orientations. In 
the medical domain, and dermatology in 
particular, the use of HOG has shown great 
promise. By distinguishing between benign and 
malignant skin lesions, it aids in the early 
diagnosis of critical conditions like melanoma 


[7][8]. 


Wavelet Transform: Wavelet transforms, 
encompassing both continuous (CWT) and 
discrete variants (DWT), have emerged as pivotal 
tools in the realm of image processing. Their 
unique ability to simultaneously analyze the 
frequency and spatial facets of an image renders 
them exceptionally suitable for medical 
endeavors. Essentially, wavelets decompose an 
image into different frequency sub-bands, 
facilitating the analysis of its distinct components 
at varied resolutions. Such multi-resolution 
analysis proves invaluable in seizing both global 
and localized image features. The medical 
imaging landscape has witnessed numerous 
applications of wavelet transforms [9][15]. 
They've been instrumental in mammography, 
where they enhance’ the contrast of 
microcalcifications, thus playing a pivotal role in 
the early detection of breast cancer. Furthermore, 
in dermatology, wavelet-centric methods excel at 
extracting pivotal texture and shape features from 
skin lesions, leading to more precise 
classification[ 16]. 


Other Feature Extraction Methods: 
Among other feature extraction techniques, 
Gabor filters stand out, drawing inspiration from 
human vision. These filters excel at detecting 
edge and texture details in images, and their 
versatility across various scales and orientations 
has earmarked them for diverse medical imaging 
tasks. The Local Binary Patterns (LBP), a potent 
texture descriptor, is another noteworthy method. 
LBP's capability to classify textures has found 
applications spanning ultrasound, MRI, and skin 
lesion imagery, unearthing nuances often elusive 
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to the human eye. Additionally, the Fourier 
Transform, which transitions an image from its 
spatial domain to its frequency domain, offers 
profound insights into an image's periodic 
components, becoming an invaluable tool in tasks 
such as tumor detection [10][17]. 


3. PROPOSED METHODOLOGY 


OBFS (Optimized Biomarker Feature 
Selection) Model: 


In recent years, the domain of medical 
imaging and diagnosis has seen a surge in 
computational methods aiming to harness the full 
potential of available data. Among these methods, 
feature extraction has proven to be indispensable 
in capturing pertinent information from images, 
while information gain techniques have paved the 
way for prioritizing these extracted features based 
on their diagnostic relevance. However, to truly 
unlock the capabilities of these techniques, an 
integrated approach is essential. This leads us to 
the introduction of the OBFS model. 


The OBFS, or Optimized Biomarker Feature 
Selection, is our proposed model designed to 
seamlessly merge the strengths of feature 
extraction and information gain. The primary 
objective of OBFS is to ensure that computational 
models working on dermatological images are 
supplied with the most relevant and significant 
features, optimizing both the efficiency and 
accuracy of the diagnostic process. 


Working Mechanism of OBFS: 


1. Feature Extraction: The initial phase of 
the OBFS model involves the extraction of a 
comprehensive set of features from the 
dermatological images. Techniques such as 
Histogram of Oriented Gradients (HOG) and 
wavelet transforms, among others, are employed 
to distill the vast image data into concise 
descriptors capturing shapes, textures, gradients, 
and other vital visual attributes. 


2. Information Gain Evaluation: Once 
the features are extracted, the model proceeds to 
assess each feature's diagnostic value using 
information gain metrics. Information gain, a 
concept rooted in information theory, measures 
the effectiveness of a feature in categorizing data. 
By gauging how well a particular feature can 
distinguish between various skin conditions, it 
allows the OBFS model to rank and prioritize 
features. 
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3. Optimal Feature Selection: With the 
ranking established, the OBFS model then selects 
the top features—those with the highest 
information gain values—ensuring that the 
computational diagnostic models are provided 
with only the most pertinent data. This selective 
approach drastically reduces the dimensionality 
of the input data, making subsequent 
classification tasks faster and more accurate. 


Feature Extraction 


Information Gain 
Evaluation 


Optimal Feature 
Selection 


Fig-2: Working Mechanism of OBFS 


Implementation and Advantages: 


Harnessing the OBFS model in 
dermatological imaging offers several 
advantages. By emphasizing both feature 
extraction and the relevance of these features, 
OBFS ensures a refined input for diagnostic 
models, enhancing their predictive accuracy. By 
reducing the feature set's size, computational 
efficiency is significantly improved, allowing for 
real-time or near-real-time processing, which is 
often crucial in clinical settings. 
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1. Initialize 
“AllExtractedFeatures’ < [] 
“IG_Values* < [] 
2. Feature Extraction 
FOR each ‘Image’ in ‘DermImageSet DO 
a. Compute gradient magnitudes and directions for “Image” 
b. `HOG_Features` & } (gradient magnitudes x direction bins) // 
Histogram computation 
C. Wavelet_Features & WaveletTransform(* Image’) // Using 
suitable wavelet 
d. ‘CombinedFeatures & Concatenate("HOG_Features `, 
`Wavelet_Features`) 
e. Append `CombinedFeatures` to `AllExtractedFeatures` 
END FOR 
. Calculate Overall Entropy 


CO 


“Entropy. Total’ € -2 (p_i x log(p_i)), where p_i isthe proportion of 


each class in the dataset 
4. Information Gain Calculation 
FOR each “Feature in “AllExtractedFeatures DO 


a. Split “DexmZmageSet * into two subsets, “$4° and $2", based on 


the median value of “Feature” 
b. “Entropy_$1° & -2 (p_i x log(p_i)) 
C. “Entropy_$2° € -2(p_i x log(p_i)) 


d. “IG” € “Entropy. Total” - ( (|S1I/IDermlmageSetl)x ` Entropy _$1` 


+ (|S2|/IDermlmageSetl)x “Entropy_$2° ) 
e. Append `IG` to `IG_Values` 
END FOR 

. Feature Ranking and Selection 


on 


Rank “AllExtractedFeatures based on “IG_Values’ in 
descending order 

“SelectedFeatures & Select top N features from 
“AllExtractedFeatures based on highest `IG_Values` 


o) 


. Return `SelectedFeatures` 


Optimized Biomarker Feature Selection 
4. EXPERIMENTATION AND RESULTS 


To evaluate the efficacy of our novel OBFS 
model and compare it against well-established 
methodologies, we leveraged the "DermNet Skin 
Disease Dataset". This dataset, sourced from the 
DermNet New Zealand archive, has become a 
benchmark in dermatological research, 
particularly for computational methodologies. 
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Characteristics of the Dataset: 


1. Diversity of Conditions: The "DermNet 
Skin Disease Dataset" encapsulates a 
comprehensive range of skin diseases, spanning 
from common ailments like acne and eczema to 
more rare and severe conditions such as 
melanoma and lupus. This diverse spectrum 
ensures that our models are exposed to various 
challenges present in skin disease classification. 


2. Image Quality and Variability: The 
images within the dataset exhibit a mix of 
resolutions and have been captured under 
different lighting conditions and setups. This 
variance simulates the real-world scenarios our 
algorithm might encounter, thereby enhancing the 
robustness of our evaluation. 


3. Annotations and Metadata: Alongside 
the image data, the dataset is supplemented with 
crucial metadata, including the type of disease, its 
severity, patient demographics, and, in some 
instances, the progression timeline. Such 
metadata provides a richer context for 
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understanding and validating our models' 
decisions. 


Rationale for Dataset Choice: 


Selecting a reliable and comprehensive 
dataset is paramount in ensuring the rigor of our 
comparison study. The "DermNet Skin Disease 
Dataset" has previously been employed in 
numerous dermatological studies, and its 
expansive collection of skin disease images offers 
a balanced representation of various skin 
conditions. Its widespread recognition in the 
research community ensures that our comparison 
maintains relevancy and can be benchmarked 
against other contemporary studies. 


Considering two well-established models 
related to image classification and feature 
selection: the Convolutional Neural Network 
(CNN) and Principal Component Analysis (PCA) 
based classification. 


Below is the comparison of the proposed 
OBFS model with the aforementioned two 
models based on four parameters: 


Table-1: Comparison Table 


PCA-based 


Parameters/Models 


OBFS Model 


CNN 


Classification 


Complexity 


Interpretability 


Moderate 


(Combines feature 
extraction and ranking 


techniques) 


High (Features 


are ranked and 
selected based on 
information gain) 


High (Deep 
layers, requires 
substantial 
training) 


Low 
(Black-box 
model) 


Variable 


Low-Moderate 
(Dimensionality 
reduction, followed by a 
classifier) 


Moderate 
(Transformed features 
might not be intuitively 
meaningful) 


High 


High (Reduces 
feature dimensions 
before classification) 


(Depends on 
depth and width 


(Dimensionality 


So mputadiona eulcency reduction leads to faster 


Adaptability to New Data 


Moderate (Might 
require re-ranking 
with new types of 
data) 


of the network) 
Moderate 
(Transfer 
learning can be 
applied, but re- 
training might be 


classification) 


High (PCA can be 
recalculated for new data 
and applied to any 
classifier) 


needed) 
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1. Complexity: Considers the number of 
operations, layers, or components involved in the 
model. While CNNs have multiple convolutional 
and dense layers making them more complex, 
PCA-based classification involves reducing the 
dimensionality, and then a classifier, making it 
relatively simpler. 


2. Interpretability: Looks at how easy it is 
to understand and explain the model. OBFS 
offers higher interpretability since you can rank 
and understand the importance of features. 
CNNs, on the other hand, are often considered 
black-box models where the learned features 
aren't always intuitively understandable. 


3. Computational Efficiency: Assesses 
the model's speed and resource demands. OBFS 
and PCA both emphasize reducing the feature 
dimensions, leading to faster classification. 
CNNs, depending on their size, can be 
computationally intensive. 


4. Adaptability to New Data: Evaluates 
how easily the model can be updated or fine- 
tuned with new data without complete retraining. 
PCA, for instance, can be recalculated for new 
datasets with ease, and the transformation can be 
applied to any classifier. 


Table-2: Comparative Results 


PC 
A-based 
Classific 

ation 


Complexit O(n 

y ) or 10k 
(Operations/Par Paramete 
ameters) 


Parameter 
s/Models 


Interpreta 
bility (Score out 
of 10) 

Computati 
onal Efficiency 
(Time in 
seconds) 

Adaptabilit 
y to New Data 
(Score 
difference) 


OBFS 
Model NN 
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Table-3: Interpretability (Score Out Of 10) 


PCA- 
based 
Classification 

Interpreta 
bility (Score 
out of 10) 


based 
Classification 


Efficiency 
(Time in 
seconds) 


OBFS Model 


Fig-3: Graph Showing Interpretability (Score Out Of 
10) 
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5: RESULT ANALYSIS 


Computational Efficiency 
In our assessment of the three models— 


(Time In seconds) OBFS, CNN, and PCA-based Classification—we 
found interesting distinctions that highlight the 
strengths and limitations of each. Beginning with 

PCA-based complexity, the OBFS model showcased a 

fas] moderate computational need, growing at a rate 
of O(nlogn) and employing around 25,000 
parameters. This positions it between the PCA- 
based Classification, which operates at a simpler 


CNN ST O(n) with about 10,000 parameters, and the more 
0 1 2 3 


Classification 


intricate CNN model, which functions at a hefty 
O(n?) rate, leveraging a parameter-heavy 
architecture with close to 2 million parameters. 

In terms of interpretability, the OBFS 
method edged ahead with a commendable score 
of 8 out of 10, reflecting its transparent feature 

4 ranking mechanism. The CNN model, known for 
its latent layers and complex transformations, 
scored a middling 5, indicative of its 'black-box' 


OBFS Model 


Fig-4: Graph Showing Computational Efficiency (Time nature. The PCA-based classifier, with its 
in Seconds) reduced dimensions feeding into a potentially 
Table-5: Adaptability To New Data (Score Difference) r model, achieved a reasonable score 
of 6. 
PCA- When we evaluated computational 
based efficiency, the PCA-based Classification model 
Classification emerged as the swiftest, processing data in a mere 
Adapta 0.8 seconds. OBFS followed closely, with its 
bility to 0 0 computations completed in 1.2 seconds. The 
New Data i 0.01 CNN, in all its layered profundity, required a 
.02 05 . 
(Score more prolonged 3.5 seconds, underscoring the 
difference) resource-intensive nature of deep learning 
models. 
Lastly, in assessing adaptability to new 
J data, we observed that the PCA-based approach 
Ada pta bility to New Data demonstrated the highest adaptability, showing a 
(Score difference) minimal performance difference of 0.01 when 
introduced to fresh datasets. OBFS maintained its 
0.06 competitive edge, with a difference of 0.02, 
0.05 suggesting a respectable level of adaptability. The 
CNN model trailed in this parameter, recording a 
UUA difference of 0.05, hinting that re-training or fine- 
0.03 tuning might often be necessary for new data 
Te sources. 
0.01 p 
x 6. CONCLUSION 
OBFS Model CNN PCA-based This research ventured into the domain of 
Classification computational skin disease classification, 
juxtaposing three distinct models—OBFS, CNN, 
Fig-5: Graph Showing Adaptability To New and PCA-based Classification. The DermNet 
Data (Score Difference) Skin Disease Dataset provided a comprehensive 


backdrop for our experiments, enabling an 
objective assessment based on a set of curated 
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parameters. From our analysis, it became evident 
that while CNNs offer deep, layered learning, 
they do so at the cost of heightened complexity 
and computational demands. The PCA-based 
Classification stood out in terms of computational 
efficiency and adaptability but faced challenges 
in direct interpretability. The OBFS model 
emerged as a harmonious blend, prioritizing both 
interpretability and computational efficiency, 
largely due to its innovative marriage of feature 
extraction and information gain techniques. 
While the findings presented shed light on the 
capabilities and boundaries of each model, it's 
pivotal to note that real-world applications 
necessitate bespoke considerations, tailored to the 
specific constraints and demands of the task at 
hand. As computational methodologies continue 
to evolve, there remains ample scope for refining, 
hybridizing, and innovating algorithms to 
enhance skin disease classification. This research 
serves as a Stepping stone, elucidating pathways 
for further exploration in the intersection of 
dermatology and computational diagnostics. 
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